Case-Sensitivity of Classifiers for WSD: Complex Systems Disambiguate Tough Words Better
Abstract
We present a novel method for improving disambiguation accuracy by building an optimal ensemble (OE) of systems, in which we predict the best available system for a target word using a priori case factors (e.g. the amount of training data per sense). We report promising results from a series of best-system prediction tests (the best prediction accuracy is 0.92) and show that complex systems disambiguate tough words better, while simple systems disambiguate easy words better. The method provides two benefits: (1) higher disambiguation accuracy for virtually any base systems (the current best OE yields an accuracy gain of close to 2% over the Senseval-3 state of the art) and (2) an economical way of building more effective ensembles of all types (e.g. optimal, weighted-voting and cross-validation-based ensembles). The method is also highly scalable: it estimates word difficulty from factors that are readily available for any ambiguous word in any language, and it defines classifier complexity using known properties only.
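The core idea of the optimal ensemble, predicting per target word which base system will perform best from a priori factors, can be illustrated with a minimal sketch. It assumes only two base systems, labelled "complex" and "simple", and a single difficulty factor, training examples per sense. It also uses a one-feature decision stump as the best-system predictor. The names `WordCase`, `fit_stump`, `predict_best_system` and `route` are hypothetical, and the toy data are invented for illustration. The paper's actual predictor, factor set and systems may differ.

```python
from dataclasses import dataclass


@dataclass
class WordCase:
    """A priori factors for one ambiguous target word (hypothetical feature set)."""
    word: str
    train_examples: int    # number of sense-tagged training instances for the word
    num_senses: int        # number of senses observed in the training data
    best_system: str = ""  # known label from held-out evaluation: "complex" or "simple"

    @property
    def train_per_sense(self) -> float:
        # Less evidence per sense is treated here as a proxy for a "tougher" word.
        return self.train_examples / max(self.num_senses, 1)


def fit_stump(cases):
    """Learn a threshold t: predict "complex" if train_per_sense <= t, else "simple".

    Returns (threshold, accuracy of the stump on the given cases).
    A threshold of -inf means "always predict simple".
    """
    candidates = [float("-inf")] + sorted({c.train_per_sense for c in cases})
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum(
            ("complex" if c.train_per_sense <= t else "simple") == c.best_system
            for c in cases
        )
        acc = correct / len(cases)
        if acc > best_acc:  # keep the first (lowest) threshold on ties
            best_t, best_acc = t, acc
    return best_t, best_acc


def predict_best_system(case, threshold):
    """Tough words (little training data per sense) go to the complex system."""
    return "complex" if case.train_per_sense <= threshold else "simple"


def route(cases, threshold):
    """Build the per-word system assignment that makes up the ensemble."""
    return {c.word: predict_best_system(c, threshold) for c in cases}


if __name__ == "__main__":
    # Toy training words whose best system is assumed known from held-out runs.
    train = [
        WordCase("bank", 40, 10, "complex"),    # 4 examples per sense
        WordCase("plant", 60, 6, "complex"),    # 10 examples per sense
        WordCase("interest", 200, 5, "simple"), # 40 examples per sense
        WordCase("line", 300, 6, "simple"),     # 50 examples per sense
    ]
    threshold, acc = fit_stump(train)
    print(threshold, acc)  # 10.0 1.0
    print(route([WordCase("art", 20, 5), WordCase("fine", 100, 4)], threshold))
    # {'art': 'complex', 'fine': 'simple'}
```

A stump keeps the predictor cheap and transparent, which fits the abstract's emphasis on readily available factors. With more factors (e.g. number of senses or sense-distribution entropy), any standard classifier could take the stump's place.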